Efficient and Adaptable Query Workload-Aware Management for RDF Data
Abstract
The Resource Description Framework (RDF) is a flexible model for representing information about resources on the Web. With the increasing amount of RDF data becoming available, efficient and scalable management of RDF data has become a fundamental challenge in achieving the Semantic Web vision. We present a flexible and adaptable approach for achieving efficient and scalable management of RDF data using relational databases. The main motivation behind our approach is that several benchmarking studies have shown that each RDF dataset requires a tailored table schema in order to achieve efficient query processing. We present a two-phase approach for designing an efficient, tailored, yet flexible storage solution for RDF data based on its query workload: 1) a workload-aware vertical partitioning phase, and 2) an automated adjustment phase that reacts to changes in the characteristics of the continuous stream of query workloads. The vertical partitioning phase aims to reduce the number of join operations during query evaluation, while the adjustment phase aims to maintain query processing efficiency by adapting the underlying schema to cope with the dynamic nature of the query workloads. We perform comprehensive experiments on two real-world RDF datasets to demonstrate that our approach is superior to the state-of-the-art techniques in this domain.
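To make the idea of workload-aware vertical partitioning concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes each query is summarized as the set of predicates it touches, and it merges predicates that co-occur in at least `min_support` queries into one multi-column table. The names `group_predicates`, `joins_needed`, `min_support`, and the sample predicates are all hypothetical, and the join count is a deliberately crude estimate (number of tables touched minus one).

```python
from collections import Counter
from itertools import combinations


def group_predicates(workload, min_support):
    """Merge predicates that co-occur in at least `min_support` queries.

    Illustrative heuristic only; the paper's actual partitioning
    algorithm is not described in the abstract.
    """
    pair_counts = Counter()
    for predicates in workload:
        for a, b in combinations(sorted(predicates), 2):
            pair_counts[(a, b)] += 1

    # Union-find: every predicate starts in its own single-predicate table.
    parent = {p: p for predicates in workload for p in predicates}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


def joins_needed(query_predicates, groups):
    """Crude estimate: one join per additional table the query touches."""
    touched = {i for i, g in enumerate(groups) if query_predicates & set(g)}
    return max(len(touched) - 1, 0)


# Hypothetical workload: each query is the set of predicates it uses.
workload = [
    {"name", "email"},
    {"name", "email", "worksFor"},
    {"name", "email"},
    {"title", "year"},
    {"title", "year"},
]

groups = group_predicates(workload, min_support=2)
# Baseline: one table per predicate (plain vertical partitioning).
baseline = [[p] for p in sorted({p for q in workload for p in q})]

total_before = sum(joins_needed(q, baseline) for q in workload)
total_after = sum(joins_needed(q, groups) for q in workload)
print(groups)
print(total_before, total_after)
```

Under these assumptions, re-running `group_predicates` over a recent window of queries would be one simple way to imitate an adjustment phase; how the paper actually detects workload changes and adapts the schema is not stated in the abstract.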
Similar Resources
chameleon-db: a Workload-Aware Robust RDF Data Management System
The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard for the conceptual modeling of web resources, and SPARQL is the standard query language for RDF. As RDF is becoming more widely utilized, RDF data management systems are being exposed to workloads that are much more diverse and dynamic than they were designed to support, for which they are unable to provide c...
Workload Matters: Why RDF Databases Need a New Design
The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable to provide consistently good performance. We propose a vision for a ...
Data-Centric Schema Creation for RDF
Very recently, the vision of the Semantic Web has brought about new challenges in data management. One fundamental research issue in this arena is storage of the Resource Description Framework (RDF): the data model at the core of the Semantic Web. In this paper, we study a data-centric approach for storage of RDF in relational databases. The intuition behind our approach is that each RDF datase...
Workload-Aware RDF Partitioning and SPARQL Query Caching for Massive RDF Graphs stored in NoSQL Databases
Governments, corporations, startups, open data initiatives and other organizations are increasingly considering RDF and SPARQL in a broad range of information management scenarios. Reducing SPARQL query times has been the main issue for virtually all recent RDF triplestores, yet SPARQL caching techniques have not been broadly considered. In this paper we present Rendezvous, a middleware...
Clustering RDF Databases Using Tunable-LSH
The Resource Description Framework (RDF) is a W3C standard for representing graph-structured data, and SPARQL is the standard query language for RDF. Recent advances in Information Extraction, Linked Data Management and the Semantic Web have led to a rapid increase in both the volume and the variety of RDF data that are publicly available. As businesses start to capitalize on RDF data, RDF data...